Multi-issue processor



TECHNICAL FIELD 

The present invention relates to a multi-issue processor comprising: a plurality
of issue slots, each one of the plurality of issue slots comprising a plurality of functional units
and a plurality of holdable registers, the plurality of issue slots comprising a first set of issue
slots and a second set of issue slots; and a register file accessible by the plurality of issue
slots.

BACKGROUND ART 

Multi-issue processors exhibit a lot of parallel hardware to enable the
concurrent execution of multiple operations in a single processor cycle, thus exploiting
instruction-level parallelism in programs. Examples of multi-issue processors are VLIW
(Very Large Instruction Word) processors and superscalar processors. In case of a VLIW
processor, the software program contains full information regarding which operations should
be executed in parallel and these operations are packed into one very long instruction. The
compiler ensures that all dependencies between operations are respected and that no resource
conflicts can occur. Apart from this program information the hardware does not require any
additional information to correctly execute the program, which results in relatively simple
hardware. In case of a superscalar processor the software to be executed is presented as a
program composed of a sequential series of operations. The processor hardware itself
determines at runtime which operation dependencies exist and decides which operations to
execute in parallel based on these dependencies, while ensuring that no resource conflicts
will occur. A relatively simple compiler suffices for translating a high-level programming
language to sequential code, but the processor hardware is very complex.

In multi-issue processors, the parallel hardware responsible for executing these
operations is organized in issue slots. Each issue slot contains one or more functional units
that perform the actual operations. Commonly, in every processor cycle a single operation is
started on one functional unit in every issue slot. In some processors more than one
functional unit is put in an issue slot as a trade-off between maximum available parallelism
and instruction width cost, in case of a VLIW processor, or hardware complexity, in case of a
superscalar processor.



Since in each clock cycle at most one operation can be started on one
functional unit in each issue slot, power may be wasted by functional units in that issue slot
that are not being used in a given processor cycle. If the input of these functional units
changes during the time that they are not used, they will still consume comparable power to
when they are being used, even though their output is irrelevant.

This waste of power can be eliminated by putting holdable registers, i.e.
registers whose state remains unchanged when their input changes, at the inputs of all
functional units within an issue slot. These holdable registers will leave the inputs of the
functional units unchanged when these functional units are not being used. Since the inputs
of these functional units remain unchanged, no combinatorial gates are switched and no
dynamic power dissipation occurs. These holdable registers can be implemented, for
example, by means of clock gating. Another advantage of these registers is that the additional
pipeline stage they form allows running the processor at a higher clock frequency. A
disadvantage of adding registers to all functional unit inputs is that it increases the
amount of state that must be saved during interrupts. An interrupt allows a processor to
quickly respond to external events and it causes the processor to temporarily postpone the
further execution of the current program trace and instead perform another trace. The state of
the postponed trace must be saved such that, when the interrupt has been serviced, the
processor can restore its original state and can correctly proceed with the original trace. In
order to obtain a predictable and short interrupt latency, it must always be possible to
interrupt the processor whenever desired. This is especially important in real-time
applications. Interrupting a processor at an arbitrary point in the program can imply that a
significant amount of state must be saved.

The non-prepublished European patent application 00203591.3 [attorneys'
docket PHNL000576], filed on 18.10.2000, provides a solution for decreasing the amount of
state that must be saved during interrupts. A second compact instruction set is applied, that is
used in an interrupt service routine and only uses a limited set of processor resources. In case
of an interrupt, it is sufficient to save the state of only the limited set of processor resources
used by the second compact instruction set, while simply freezing the state in all other
resources. However, the resources used by the second compact instruction set still have a
considerable amount of state that must be saved and restored during interrupts, when registers
are put at all the inputs of each functional unit in this limited set of resources.

DISCLOSURE OF INVENTION

An object of this invention is to provide a solution to further reduce the
amount of state that must be saved during interrupt handling for multi-issue processors, while
maintaining a significant reduction in power consumption and improved performance.

This object is achieved with a multi-issue processor of the kind set forth,
characterized in that a location of at least a part of the plurality of holdable registers in the
first set of issue slots is different from a location of at least a corresponding part of the
plurality of holdable registers in the second set of issue slots.

Ideally, the holdable registers are put at all inputs of each functional unit
within an issue slot. In that case it is guaranteed that each input of a functional unit that is
not being used will remain unchanged and no unnecessary power dissipation will occur.
However, this increases the amount of state that has to be saved during interrupt handling. By
varying the position of the holdable registers for different issue slots, and not putting a
holdable register in front of all inputs of every functional unit, less state saving is required
during interrupt handling. This may result in a lower reduction of the power consumption or a
reduced increase in performance. Depending on the type of application an optimal choice
between these demands can be made.

An embodiment of the invention is characterized in that the multi-issue
processor further comprises a first instruction set means having access to the first set of issue
slots and a second instruction set means having access to the second set of issue slots. An
advantage of this embodiment is that the location of the holdable registers in an issue slot can
be made dependent on the instruction set means that controls this issue slot. If the second
instruction set means is used in an interrupt service routine, the holdable registers in the
second set of issue slots can be positioned to optimally reduce the amount of state that must
be saved during interrupt handling. However, this solution is not optimal for reduction of the
power consumption. The positioning of the holdable registers still creates an additional
pipeline stage enabling an increase in the clock frequency of the processor. Many interrupts
require very simple interrupt service routines and therefore a compact second instruction set
using a limited set of issue slots is sufficient. Therefore the non-optimal reduction in power
consumption only holds for a small set of issue slots within the multi-issue processor. The
first set of issue slots is not used during interrupt handling and as a result their state does not
have to be saved. In these issue slots the holdable registers can be placed to optimally reduce
the power consumption and to increase the clock frequency by creating an additional pipeline
stage. For the overall processor this results in a well-balanced trade-off between increasing
performance, decreasing power consumption and reducing state saving overhead.

An embodiment of the invention is characterized in that in the first set of issue
slots the location of the plurality of holdable data registers is at individual data inputs of the
functional units, while in the second set of issue slots the location of the plurality of holdable
data registers is at common data inputs of the functional units. An advantage of this
embodiment is that the amount of state that has to be saved during interrupt handling is
strongly reduced, since the holdable registers are not positioned at all individual inputs of the
functional units of the second set of issue slots, but only at their common inputs. However,
the use of one functional unit of an issue slot of the second set of issue slots results in
changing inputs at the other functional units of that issue slot and therefore causes
unnecessary power dissipation. In case the entire issue slot is not being used, the functional
units will consume no power. In the first set of issue slots the holdable registers are
positioned at all inputs of the functional units to optimally reduce power consumption,
resulting in a significant overall reduction in the power consumption. Furthermore, the
holdable registers in the first and second set of issue slots form an additional pipeline stage in
the architecture, allowing the processor to run at a higher clock frequency. As a result, a good
compromise is obtained between reduction in power consumption, increase in performance
and reduction in the amount of state that has to be saved during interrupt handling.

BRIEF DESCRIPTION OF THE DRAWINGS 

The features of the described embodiments will be further elucidated and 
described with reference to the drawings: 

Fig. 1 is a schematic diagram of a VLIW processor.
Fig. 2 is a schematic diagram of issue slots UC1, UC2 and UC3, which are only used by a
first instruction set.
Fig. 3 is a schematic diagram of issue slot UC0, which is used by a second instruction
set during interrupt handling.

DESCRIPTION OF PREFERRED EMBODIMENTS

Referring to Fig. 1, a schematic block diagram illustrates a VLIW processor
comprising a plurality of issue slots, including issue slots UC0, UC1, UC2 and UC3, and a
distributed register file including register file segments RF0 and RF1. The processor has a
controller SQ and a connection network CN for coupling the register file segments RF0 and
RF1, and the issue slots UC0, UC1, UC2 and UC3. The issue slots UC0, UC1, UC2 and UC3 are
used by a first instruction set and this first instruction set includes the normal VLIW
instructions. The issue slot UC0 is the only issue slot that is used by a second instruction set.
This second instruction set is used in an interrupt service routine.
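The division of issue slots between the two instruction sets can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware of the embodiment: the class `IssueSlot`, the functions `enter_interrupt` and `leave_interrupt`, and the dictionary standing in for holdable-register contents are all hypothetical names introduced here. It shows that only the state of the slot used by the second instruction set is copied, while the other slots are merely frozen.

```python
from dataclasses import dataclass, field


@dataclass
class IssueSlot:
    name: str
    used_by_second_set: bool                   # True only for slots the interrupt routine may use
    state: dict = field(default_factory=dict)  # hypothetical holdable-register contents
    frozen: bool = False


def enter_interrupt(slots):
    """Save the slots the second instruction set uses; freeze all others."""
    saved = {}
    for slot in slots:
        if slot.used_by_second_set:
            saved[slot.name] = dict(slot.state)  # copy, the routine may overwrite it
        else:
            slot.frozen = True                   # state stays in place, nothing is copied
    return saved


def leave_interrupt(slots, saved):
    """Restore the saved slots and release the frozen ones."""
    for slot in slots:
        if slot.name in saved:
            slot.state = dict(saved[slot.name])
        slot.frozen = False


# Four slots as in Fig. 1; only UC0 is used by the second instruction set.
slots = [IssueSlot(f"UC{i}", used_by_second_set=(i == 0), state={"r": i}) for i in range(4)]
saved = enter_interrupt(slots)
```

In this model the cost of entering an interrupt grows only with the state held in UC0, which is why the register placement in that slot matters most for interrupt latency.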



Referring to Fig. 2, a schematic block diagram illustrates issue slots UC1, UC2
and UC3. Referring to Fig. 3, a schematic block diagram illustrates issue slot UC0. Referring
now to both Fig. 2 and Fig. 3, each issue slot comprises a decoder DEC, a time shape
controller TSC, an input routing network IRN, an output routing network ORN, and a
plurality of functional units, including functional units FU0, FU1 and FU2. The decoder DEC
is coupled to the time shape controller TSC and to the functional units FU0, FU1 and FU2.
The input routing network IRN is coupled to the functional units FU0, FU1 and FU2. The
output routing network ORN is also coupled to the functional units FU0, FU1 and FU2. The
decoder DEC decodes the operation O applied to the issue slot in each clock cycle. Results of
the decoding step are operand register indices ORI and the decoder DEC passes these indices
to the connection network CN, shown in Fig. 1. Further results of the decoding step are result
file indices RFI and register indices RI. The decoder DEC passes these indices to the time
shape controller TSC. The time shape controller TSC delays the result file indices RFI and
the register indices RI by the proper amount, according to the input/output behavior of the
functional unit on which the operation must be executed. Subsequently, the time shape
controller TSC passes the result file indices RFI and the register indices RI to the connection
network CN, shown in Fig. 1. The decoder DEC also selects one of the functional units FU0,
FU1 and FU2 to perform an operation, using the coupling SEL. Furthermore, the decoder
DEC passes information on the type of operation that has to be performed to the functional
units FU0, FU1 and FU2, using the coupling OPT. The input routing network IRN passes the
operand data OD for the issue slots UC1, UC2 and UC3 to the inputs of functional units FU0,
FU1 and FU2. The functional units FU0, FU1 and FU2 pass their output data to the output
routing network ORN and subsequently the output routing network ORN passes the result
data RD to the connection network CN, see Fig. 1.

Referring to Fig. 2, holdable registers 1 - 27 are provided directly at the data
and control inputs of the functional units FU0, FU1 and FU2. Holdable registers 1-5, 11-15,
21 and 23 are referred to as holdable data registers, since they are positioned at the data
inputs of the functional units FU0, FU1 and FU2. The holdable registers 1-27 will leave the
inputs of the functional units FU0, FU1 and FU2 unchanged when a functional unit is not
being used. As a result, no combinatorial gates are switched and no dynamic power dissipation occurs.
Furthermore, to prevent result file indices RFI and register indices RI from changing
unnecessarily, and thereby causing unnecessary power dissipation, holdable registers 29, 31
and 33 are placed directly after the time shape controller TSC. An advantage of this
embodiment is that it reduces the power consumption. In each clock cycle at most one
operation can be started on one of the functional units FU0, FU1 and FU2, and most functional
units finish their operation in a single processor cycle. If the inputs of the functional units
that are not being used change due to data passed via the input routing network IRN or the
decoder DEC, these functional units will consume comparable power to when they are
being used, even though their output is irrelevant. Adding the holdable registers 1 - 33
creates additional state, but that is irrelevant for the issue slots UC1, UC2 and UC3. During
interrupts, their state only has to be frozen. The holdable registers 1 - 33 only incur
additional area. They do not waste additional power, since clock gating holds the registers
in their inactive state when the corresponding functional unit is not being
used.
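The behavior of a holdable register can be illustrated with a small behavioral model. The sketch below is an assumption-laden simplification, not a description of the actual circuit: the class `HoldableRegister`, the helper `run`, and the example operand trace are hypothetical, and clock gating is modeled simply as an `enable` flag that decides whether the register captures its input on a clock edge.

```python
class HoldableRegister:
    """Register whose output only follows its input while `enable` is true."""

    def __init__(self, value=0):
        self.value = value
        self.toggles = 0  # number of clock edges on which the output changed

    def clock(self, data_in, enable):
        # With the clock gated off (enable false) the stored value is held,
        # so the logic behind the register sees no input transition.
        if enable and data_in != self.value:
            self.value = data_in
            self.toggles += 1
        return self.value


def run(selected_unit, operands, unit_id):
    """Count output changes of the register in front of functional unit `unit_id`."""
    reg = HoldableRegister()
    for sel, op in zip(selected_unit, operands):
        reg.clock(op, enable=(sel == unit_id))
    return reg.toggles


selected = [0, 1, 1, 0, 2, 1]  # functional unit on which an operation starts, per cycle
operands = [7, 3, 9, 4, 8, 5]  # operand value arriving from the routing network, per cycle
toggles_fu0 = run(selected, operands, unit_id=0)
```

For this trace the register in front of unit 0 changes only in the two cycles in which unit 0 is selected, whereas an ungated input would change in all six cycles.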

Referring to Fig. 3, issue slot UC0 is the only issue slot that is used by the
second instruction set, used in an interrupt service routine. In order to guarantee a fast
interrupt response, it is crucial to minimize the amount of state that has to be saved during
interrupt handling. This can be achieved by positioning the holdable registers at common
inputs of the functional units FU0, FU1 and FU2. Therefore, holdable registers 101, 103 and
105 are put directly at the input of the issue slot UC0 instead of at the data inputs of each
functional unit FU0, FU1 and FU2 in issue slot UC0. Furthermore, a holdable register 117 is
put at the output of the decoder DEC for passing information on the type of operation OPT
that has to be performed, instead of at the input of each functional unit FU0, FU1 and FU2 in
issue slot UC0. At the result file index input and register index input terminals of the time
shape controller TSC holdable registers 113 and 115 are positioned as well, instead of at their
outputs, saving one holdable register. The positioning of the holdable registers 107, 109 and
111 at the input of each functional unit FU0, FU1 and FU2 remains unchanged, since these
functional unit inputs are not coupled to a common output of the decoder DEC.
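The effect of the two placements on the number of holdable data registers can be made concrete with a short calculation. The sketch below is a simplified count under stated assumptions: it considers data registers only (control registers are left out), it assumes that the common inputs fan out to every functional unit so that the unit with the most data inputs determines how many are needed, and the input counts (3, 3, 2) are illustrative values rather than figures stated in the text. The function names are hypothetical.

```python
def registers_individual(data_inputs_per_unit):
    """One holdable data register at every data input of every functional unit."""
    return sum(data_inputs_per_unit)


def registers_common(data_inputs_per_unit):
    """One holdable data register per shared data input of the issue slot.

    Assumes the shared inputs fan out to all units, so the unit with the
    most data inputs determines how many registers are needed.
    """
    return max(data_inputs_per_unit)


data_inputs = [3, 3, 2]  # illustrative: data inputs of three functional units
individual = registers_individual(data_inputs)  # placement at individual inputs
common = registers_common(data_inputs)          # placement at common inputs
```

With these values the individual placement needs eight data registers and the common placement needs three, so fewer registers have to be saved when the slot is interrupted.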

An advantage of the positioning of the holdable registers in issue slot UC0 is
that the amount of state that has to be saved during an interrupt is strongly reduced, when
compared to the amount of state present due to the holdable registers in the issue slots UC1,
UC2 and UC3. The use of one of the functional units FU0, FU1 and FU2 in the issue slot UC0 results
in changing inputs at the other functional units of issue slot UC0 and therefore causes
unnecessary power dissipation in this issue slot. In case the entire issue slot is not being used,
the holdable registers 101 - 111 and 117 will prevent power consumption by the functional
units FU0, FU1 and FU2 of issue slot UC0.
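The power consequence of the common placement can be illustrated by counting input changes at functional units that are not selected. The following sketch is a rough activity model under assumptions of our own: the function `wasted_transitions`, the placement labels and the example trace are hypothetical, each unit is reduced to a single data input, and a changed input at an unselected unit is taken as a proxy for wasted dynamic power.

```python
def wasted_transitions(trace, num_units, placement):
    """Count input changes seen by functional units that are not selected.

    trace: list of (selected_unit or None, operand), one entry per cycle.
    placement: "individual" gates a register per unit input;
               "common" gates one register shared by all units of the slot.
    """
    inputs = [0] * num_units  # value currently present at each unit's data input
    wasted = 0
    for selected, operand in trace:
        if selected is None:
            continue  # slot idle: every holdable register keeps its value
        for unit in range(num_units):
            updates = (unit == selected) if placement == "individual" else True
            if updates and inputs[unit] != operand:
                inputs[unit] = operand
                if unit != selected:
                    wasted += 1  # an unused unit saw its input change
    return wasted


trace = [(0, 5), (1, 6), (None, 9), (2, 7), (0, 5)]
individual = wasted_transitions(trace, 3, "individual")
common = wasted_transitions(trace, 3, "common")
```

In this model the individual placement produces no wasted transitions, while the common placement produces two per active cycle for this trace; in both cases an idle slot produces none, in line with the paragraph above.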

For the issue slots UC0, UC1, UC2 and UC3, the location of the holdable
registers results in a well-balanced trade-off between increasing performance, decreasing
power consumption and reducing state overhead. Many interrupts require very simple
interrupt service routines and therefore only require a compact second instruction set that
uses a limited second set of issue slots. In a large subset of the issue slots the holdable
registers can be positioned as indicated in Fig. 2 to optimally reduce the power consumption,
resulting in a significant overall reduction of the power consumption. The amount of state
that has to be saved during interrupt handling is strongly reduced by positioning the holdable
registers in the issue slots used by the second instruction set as indicated in Fig. 3.
Furthermore, the holdable registers added to the issue slots UC0, UC1, UC2 and UC3 form an
additional pipeline stage in the architecture, allowing the processor to run at a higher clock
frequency. Referring again to Fig. 1, the holdable registers positioned in issue slots UC0,
UC1, UC2 and UC3 divide the existing data path into two parts, decreasing the time needed to
execute one part of the data path and allowing the clock frequency of the processor to be
increased.
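The relation between splitting the data path and the attainable clock frequency can be shown with a small calculation. The delay values below are purely hypothetical, and the sketch ignores register setup time and clock-to-output delay; it only illustrates that the clock period is bounded by the slowest stage rather than by the whole path.

```python
def max_clock_mhz(stage_delays_ns):
    """Highest clock frequency at which the slowest stage still fits in one cycle."""
    return 1000.0 / max(stage_delays_ns)


unpipelined = max_clock_mhz([10.0])    # whole data path traversed in one cycle
pipelined = max_clock_mhz([4.0, 6.0])  # same path divided in two by a register stage
```

With these assumed delays the undivided path allows 100 MHz, while the divided path is limited by its 6 ns part and allows roughly 167 MHz.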

A superscalar processor also comprises multiple issue slots that can perform
multiple operations in parallel, as in case of a VLIW processor. The principles of the
embodiments for a VLIW processor, described in this section, therefore also apply to a
superscalar processor. In general, a VLIW processor may have more issue slots when
compared to a superscalar processor. The hardware of a VLIW processor is less complicated
when compared to a superscalar processor, which results in a more scalable architecture. The
number of issue slots and the number of functional units in each issue slot, among other
things, will determine the relative decrease in power consumption due to the present
invention.

It should be noted that the above-mentioned embodiments illustrate rather than
limit the invention, and that those skilled in the art will be able to design many alternative
embodiments without departing from the scope of the appended claims. In the claims, any
reference signs placed between parentheses shall not be construed as limiting the claim. The
word "comprising" does not exclude the presence of elements or steps other than those listed
in a claim. The word "a" or "an" preceding an element does not exclude the presence of a
plurality of such elements. In the device claim enumerating several means, several of these
means can be embodied by one and the same item of hardware. The mere fact that certain
measures are recited in mutually different dependent claims does not indicate that a
combination of these measures cannot be used to advantage.

CLAIMS:



1. A multi-issue processor comprising:

a plurality of issue slots, each one of the plurality of issue slots comprising a
plurality of functional units and a plurality of holdable registers, the plurality of issue slots
comprising a first set of issue slots and a second set of issue slots; and

a register file accessible by the plurality of issue slots;
characterized in that a location of at least a part of the plurality of holdable registers in the
first set of issue slots is different from a location of at least a corresponding part of the
plurality of holdable registers in the second set of issue slots.

2. A multi-issue processor according to Claim 1 comprising: 

a first instruction set means having access to at least the first set of issue slots; 
a second instruction set means having access to the second set of issue slots. 

3. A multi-issue processor according to Claim 1 or 2 wherein:

in the first set of issue slots the location of the plurality of holdable data
registers is at individual data inputs of the functional units, while in the second set of issue
slots the location of the plurality of holdable data registers is at common data inputs of the
functional units.